


Statistical Performance Analysis of MDL Source 
Enumeration in Array Processing 

F. Haddadi*, M. Malek Mohammadi, M. M. Nayebi, 
and M. R. Aref 



Abstract: In this correspondence, we focus on the performance analysis of the widely used minimum description length (MDL) source enumeration technique in array processing. Available theoretical analyses deviate from simulation results. We present an accurate and insightful performance analysis for the probability of missed detection. We also show that the statistical performance of the MDL is approximately the same under both deterministic and stochastic signal models. Simulation results show the superiority of the proposed analysis over available results.

Index Terms: Minimum description length (MDL), source enumeration, performance analysis, deterministic signal.

EDICS Category: SAM-PERT, SAM-SDET 

I. Introduction and Preliminaries 

MDL [1] is one of the most successful methods for determining the number of signals present in array processing and for channel order detection [2]. MDL is a low-complexity information theoretic criterion that does not need the subjective threshold setting usual in detection theoretic criteria. Its other statistical properties, especially its asymptotic consistency [1], make it a favorable choice for source enumeration. Unfortunately, only a few approximate finite-sample performance analyses of the MDL method are available [3]-[8]. In [3], a simple asymptotic statistical model for the eigenvalues of the sample correlation matrix was used, and the theoretical results showed a persistent bias with respect to the simulation results [4].

The next work [5] gives a computational approach for calculating the probability of false alarm $p_{fa}$. In calculating the probability of missed detection $p_m$, it uses the same inaccurate statistical model as [3]. In [6], theoretical bounds on the performance were presented instead of an exact performance estimate. A qualitative performance evaluation in terms of the gap between the noise and signal eigenvalues, and of the dispersion of each group, is given in [7]. A recent work [8] used a significantly different approach, and our simulation results show that [8] improves on [3]. In [8], the performance analysis was generalized to non-Gaussian signals, and it was shown that the results reduce to those of [5], [6] for Gaussian signals. We will show that the same modelling errors that affect [3]-[6] have also degraded the analysis in [8].

In this correspondence, we use an approach very similar to [3]-[5] to estimate $p_m$, including in the analysis the finite-sample $O(n^{-1})$ biases of the eigenvalues. The noise subspace eigenvalue spread is taken into account, which prevents the signal subspace eigenvalues from approaching $\sigma^2$, the noise variance. The bias of the noise power estimator in MDL is calculated to obtain an excellent match between theoretical and simulation results. We do not calculate $p_{fa}$, which is negligible.

In the previous works, only the stochastic signal case has been considered. Here, we use a perturbation analysis to calculate the biases and variances of the eigenvalues under the deterministic signal model as well. Using these results, we show that the performance of source enumeration methods is approximately the same under the stochastic and deterministic signal models. This is a natural complement to the known fact that the performance of DOA (direction of arrival) estimation methods in array processing is the same under stochastic and deterministic signal models [9].

From a sensor array of $L$ elements, $n$ observations $x_i \in \mathbb{C}^{L \times 1}$, $i = 1, \ldots, n$, are made, each of which is a linear transformation of $d < L$ source signals $s_i \in \mathbb{C}^{d \times 1}$ plus noise $v_i \in \mathbb{C}^{L \times 1}$:

$$x_i = A(\theta) s_i + v_i \tag{1}$$

where $A \in \mathbb{C}^{L \times d}$, the steering matrix, is composed of $d$ linearly independent column vectors of array response $a(\theta_k)$, $k = 1, \ldots, d$. Let $X = [x_1, \ldots, x_n]$ and let $S$ and $V$ be defined in the same way. Signal and noise are assumed to be i.i.d. and uncorrelated random variables. A compact form of the model is

$$X = A(\theta) S + V. \tag{2}$$

Noise is assumed to be circular Gaussian. The signal can be modelled either as a zero-mean circular Gaussian random sequence or as an unknown deterministic sequence. The distribution of $x$ is $\mathcal{N}(0, APA^H + \sigma^2 I)$, where $P = E(ss^H)$, in the stochastic signal model, and $\mathcal{N}(As, \sigma^2 I)$ in the deterministic signal model.

To estimate the number of signals present, $d$, the eigenvalues of the correlation matrix $R = n^{-1} E(XX^H)$ are used. Note that $R_{det} = n^{-1} A S S^H A^H + \sigma^2 I$ and $R_{sto} = APA^H + \sigma^2 I$. The eigendecomposition of the correlation matrix is

$$R v_i = \lambda_i v_i \tag{3}$$

and we have $\lambda_1 > \cdots > \lambda_d > \lambda_{d+1} = \cdots = \lambda_L = \sigma^2$. Source enumeration methods are based on a sphericity test on the sample correlation matrix, defined as

$$\hat{R} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^H. \tag{4}$$

The eigendecomposition of $\hat{R}$ is defined as $\hat{R} w_i = l_i w_i$, in which $l_1 > l_2 > \cdots > l_L$. The MDL estimator of $d$ is the minimizer of the following criterion

$$\Lambda(d, L, n) = n(L-d) \log\left(\frac{a_d}{g_d}\right) + \frac{1}{2} d(2L-d) \log(n) \tag{5}$$

where

$$a_d = \frac{1}{L-d} \sum_{i=d+1}^{L} l_i \tag{6}$$

$$g_d = \prod_{i=d+1}^{L} l_i^{1/(L-d)}. \tag{7}$$

The first term in (5) is the generalized likelihood ratio for the sphericity test, and the second term is a penalty function preventing over-modelling.
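As an illustration of the criterion above, the following minimal sketch evaluates the MDL cost for every candidate $d = 0, \ldots, L-1$ from a list of sample eigenvalues and returns the minimizer. The function names `mdl_criterion` and `mdl_estimate` and the example eigenvalues are assumptions made for this illustration; they do not come from the original analysis.

```python
import numpy as np

def mdl_criterion(eigvals, n):
    """Evaluate the MDL cost for d = 0, ..., L-1 (hypothetical helper)."""
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # l_1 >= ... >= l_L
    L = l.size
    cost = np.empty(L)
    for d in range(L):
        tail = l[d:]                       # the L-d smallest eigenvalues
        a_d = tail.mean()                  # arithmetic mean of the tail
        g_d = np.exp(np.log(tail).mean())  # geometric mean of the tail
        cost[d] = (n * (L - d) * np.log(a_d / g_d)
                   + 0.5 * d * (2 * L - d) * np.log(n))
    return cost

def mdl_estimate(eigvals, n):
    """Return the d that minimizes the MDL cost."""
    return int(np.argmin(mdl_criterion(eigvals, n)))

# Illustrative eigenvalues: two clearly dominant values and four near 1.
example_eigs = [12.0, 6.0, 1.1, 1.0, 0.95, 0.9]
d_hat = mdl_estimate(example_eigs, n=100)
```

With these illustrative eigenvalues the log-likelihood term collapses once the two large eigenvalues are excluded, so the penalty term decides among the remaining candidates.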

II. Statistical Properties of Eigenvalues 

A. Signal Eigenvalues 

First of all, we derive a result useful for the statistical characterization of the signal eigenvalues in the deterministic signal model. Let $x_i \in \mathbb{C}^{L \times 1}$, $i = 1, \ldots, n$, be i.i.d. observations with $x_i \sim \mathcal{N}(0, \Sigma)$. Note that $\mathrm{vec}(X) \sim \mathcal{N}(0, I_n \otimes \Sigma)$, where $\otimes$ is the Kronecker product and $\mathrm{vec}(X)$ is the vectorizing operator that stacks the columns of $X$ into a single column vector. Let $\alpha, \beta, \gamma, \zeta \in \mathbb{C}^{L \times 1}$ be constant vectors. The Brillinger result states that [10, p. 114]:

$$\mathrm{Cov}(\alpha^H \hat{R} \beta,\; \gamma^H \hat{R} \zeta) = n^{-1} (\alpha^H \Sigma \gamma)(\zeta^H \Sigma \beta). \tag{8}$$
We generalize the Brillinger result to the nonzero-mean case. To the best of our knowledge, the following result is new to the literature.

Lemma 1: Let $\mathrm{vec}(Y) \sim \mathcal{N}(\mathrm{vec}(\mu), I_n \otimes \Sigma)$, where $\mu = [\mu_1, \ldots, \mu_n]$ and $Y \triangleq [y_1, \ldots, y_n]$. Then for $A = n^{-1} Y Y^H$ and constant vectors $\alpha, \beta, \gamma, \zeta \in \mathbb{C}^{L \times 1}$, we have

$$c = \mathrm{Cov}(\alpha^H A \beta,\; \gamma^H A \zeta) = n^{-1} (\alpha^H \Sigma \gamma)(\zeta^H \Sigma \beta) + n^{-2} (\alpha^H \mu \mu^H \gamma)(\zeta^H \Sigma \beta) + n^{-2} (\alpha^H \Sigma \gamma)(\zeta^H \mu \mu^H \beta). \tag{9}$$
□

Proof: See Appendix I.

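We first briefly state useful available results.

Theorem 1: Let $\mathrm{vec}(X) \sim \mathcal{N}(0, I_n \otimes \Sigma)$. Then the signal eigenvalues of $\hat{R}$ in the asymptotic region $n \gg 1$ have a limiting Gaussian distribution, and we have [10], [15]

$$E(l_i) = \lambda_i + \sum_{j \neq i} \frac{\lambda_i \lambda_j}{n(\lambda_i - \lambda_j)} + O(n^{-2}) \tag{10}$$

$$\mathrm{Cov}(l_i, l_j) = \delta_{ij}\, n^{-1} \lambda_i^2 + O(n^{-2}) \tag{11}$$

where $\delta_{ij}$ is the Kronecker delta function. Now we generalize Theorem 1 to the non-central case.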

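Theorem 2: Let $\mathrm{vec}(X) \sim \mathcal{N}(\mathrm{vec}(\mu), I_n \otimes \sigma^2 I_L)$. Then asymptotically for the signal eigenvalues of $\hat{R}$ we have

$$E(l_i) = \lambda_i + \sum_{j \neq i} \frac{(\lambda_i + \lambda_j)\sigma^2 - \sigma^4}{n(\lambda_i - \lambda_j)} + O(n^{-2}) \tag{12}$$

$$\mathrm{Cov}(l_i, l_j) = \delta_{ij}\, \frac{1}{n}\left(2\lambda_i \sigma^2 - \sigma^4\right) + O(n^{-2}). \tag{13}$$
□

Proof: See Appendix II.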



B. Noise Eigenvalues

The eigenvalues associated with the noise subspace come from a spherical subspace. Therefore, they are not sufficiently separated, but are packed tightly around the noise power $\sigma^2$. The perturbation analysis in Appendix II is then no longer valid, since their eigenvectors change dramatically with a small perturbation in $R$. The distribution of the noise eigenvalues is identical to that of noise-only observations in an $(L-d)$-dimensional noise subspace, with a small negative bias introduced by the signal eigenvalues [11]. Here, we introduce two statistical distributions to show that some noise eigenvalues are considerably larger than $\sigma^2$. This invalidates the approximations used in [3] for calculating $p_m$. At low SNRs, the weakest signal eigenvalue approaches the largest noise eigenvalue but cannot pass it, due to the ordering of the eigenvalues. In this subsection, we assume $\sigma^2 = 1$.

1) The Marcenko-Pastur distribution: For sufficiently large $n$ and $L$, with $\gamma = n/L$ and in the null case, the distribution of the unordered noise eigenvalues is [11]

$$g(l) = \frac{\gamma}{2\pi l} \sqrt{(b-l)(l-a)}, \quad a < l < b \tag{14}$$

where $a = (1 - \gamma^{-1/2})^2$ and $b = (1 + \gamma^{-1/2})^2$, as depicted in Fig. 1. Note that $g(l)$ is a univariate distribution since it expresses the bulk distribution [11] of the eigenvalues, i.e., in the null case, the eigenvalues of the covariance matrix are $L$ independent samples of this distribution.
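The spread of the noise eigenvalues can be checked numerically. The sketch below evaluates the Marcenko-Pastur density $g(l) = \frac{\gamma}{2\pi l}\sqrt{(b-l)(l-a)}$ on its support for unit noise power and verifies with a midpoint rule that it integrates to one with mean one when $\gamma \geq 1$. The function name `marchenko_pastur_pdf` and the choice $\gamma = 4$ are assumptions made for this illustration.

```python
import numpy as np

def marchenko_pastur_pdf(l, gamma):
    """Marcenko-Pastur density for gamma = n/L >= 1 and unit noise power."""
    a = (1 - gamma ** -0.5) ** 2           # lower edge of the support
    b = (1 + gamma ** -0.5) ** 2           # upper edge of the support
    l = np.atleast_1d(np.asarray(l, dtype=float))
    pdf = np.zeros_like(l)
    inside = (l > a) & (l < b)
    li = l[inside]
    pdf[inside] = gamma / (2 * np.pi * li) * np.sqrt((b - li) * (li - a))
    return pdf, a, b

gamma = 4.0                                 # e.g. n = 40 snapshots, L = 10 sensors
_, a, b = marchenko_pastur_pdf(1.0, gamma)

# Midpoint-rule integration over the support [a, b].
N = 200_000
dx = (b - a) / N
grid = a + (np.arange(N) + 0.5) * dx
pdf, _, _ = marchenko_pastur_pdf(grid, gamma)
total_mass = pdf.sum() * dx                 # should be close to 1
mean_eig = (grid * pdf).sum() * dx          # should be close to sigma^2 = 1
```

For $\gamma = 4$ the support is $[0.25, 2.25]$, so noise eigenvalues more than twice the noise power are expected even without any signal.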

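2) The Tracy-Widom distribution: The largest eigenvalue of a complex correlation matrix in the null case has a bell-shaped distribution called $F_2$, with moments [11]

$$E(l_1) \approx \mu_{nL} - 1.8\, \sigma_{nL} \tag{15}$$

$$\mathrm{Std}(l_1) \approx 0.9\, \sigma_{nL} \tag{16}$$

in which

$$\mu_{nL} = \left(1 + \sqrt{\frac{L}{n}}\right)^2 \tag{17}$$

$$\sigma_{nL} = \sqrt{\frac{\mu_{nL}}{n}} \left(\frac{1}{\sqrt{n}} + \frac{1}{\sqrt{L}}\right)^{1/3}. \tag{18}$$

Fig. 1. Limiting densities of the noise subspace eigenvalues for the $\gamma = 1$ and $\gamma = 4$ cases. The spread of the eigenvalues around 1 is evident.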



As a numerical example, assume $n = 100$ and $L = 10$; then $E(l_1) \approx 1.55$ and $\mathrm{Std}(l_1) \approx 0.09$, which implies that $l_1 > 1.3$ with high probability. We conclude that the signal eigenvalues should be considerably larger than $\sigma^2$.
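The numerical example can be reproduced with a few lines. This sketch assumes the centering and scaling constants $\mu_{nL} = (1 + \sqrt{L/n})^2$ and $\sigma_{nL} = \sqrt{\mu_{nL}/n}\,(n^{-1/2} + L^{-1/2})^{1/3}$ for unit noise power, together with the approximate Tracy-Widom mean and standard deviation factors $-1.8$ and $0.9$ quoted above. The function name `largest_noise_eig_moments` is hypothetical.

```python
import math

def largest_noise_eig_moments(n, L):
    """Approximate mean and std of the largest noise eigenvalue (sigma^2 = 1)."""
    mu = (1 + math.sqrt(L / n)) ** 2
    sigma = math.sqrt(mu / n) * (1 / math.sqrt(n) + 1 / math.sqrt(L)) ** (1 / 3)
    mean_l1 = mu - 1.8 * sigma   # centering shifted by the Tracy-Widom mean
    std_l1 = 0.9 * sigma         # scaled by the Tracy-Widom standard deviation
    return mean_l1, std_l1

mean_l1, std_l1 = largest_noise_eig_moments(n=100, L=10)
# Almost all of the probability mass lies above mean - 3 * std.
lower_3sigma = mean_l1 - 3 * std_l1
```

Under these assumptions the three-sigma lower limit lands close to 1.3, which is consistent with the statement that $l_1 > 1.3$ with high probability.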

III. Probability of Missed Detection 

A. Method of Calculation 

In this subsection, using the statistical tools developed in the previous section, we calculate $p_m$ for the MDL method. $p_{fa}$ is negligible for moderate values of $n$ and $L$. For example, for $L = 3$ and $n = 30$, $p_{fa} = 0.003$, and it decays rapidly as $n$ and $L$ increase. $p_m$ can be used to estimate the minimum energy level of a source for it to be detectable by the system. It can also be used to determine the system capability for resolving very close sources. We therefore concentrate on $p_{m1} = p_m(d = 1)$ and $p_{m2} = p_m(d = 2)$, although our method can be used for the general scenario. Let $H_1$ denote the situation in which only one source is present:



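$$p_{m1} = P\left(\Lambda(0, L, n) < \Lambda(1, L, n) \mid H_1\right). \tag{19}$$

Using (5) and rearranging the terms in (19), we get

$$p_{m1} = P\left(L \log\frac{a_0}{g_0} - (L-1)\log\frac{a_1}{g_1} < \frac{1}{2n}(2L-1)\log(n)\right). \tag{20}$$

By the definition of $a_d$ in (6), we can write

$$a_0 = \frac{1}{L}\, l_1 + \frac{L-1}{L}\, a_1. \tag{21}$$

Similarly, for the geometric mean, using (7) we have

$$g_0^L = l_1\, g_1^{L-1}. \tag{22}$$

Substituting (21) and (22) in (20), we get [3]

$$p_{m1} = P\left(\log Q_{m1}\left(\frac{l_1}{a_1}\right) < T_1\right) \tag{23}$$

where

$$Q_{m1}(x) = \frac{1}{x}\left(\frac{L-1}{L} + \frac{x}{L}\right)^L \tag{24}$$

and

$$T_1 = \frac{1}{2n}(2L-1)\log(n). \tag{25}$$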
In [3], the function $\log(Q_{m1}(x))$ is approximated by its second-order Taylor series near $x = 1$. This is one source of avoidable error in the method. The smallest eigenvalue of the signal subspace is greater than the largest eigenvalue of the noise subspace, which, from subsection II-B, is larger than $\sigma^2$. Recalling also that $a_1 \approx \sigma^2$, we conclude that $x > 1$. The function $\log(Q_{m1}(x))$ is monotonically increasing in the region $x > 1$, so we can translate the inequality in (23) into a simpler one

$$p_{m1} = P(x < T_{1x}) \tag{26}$$

where

$$\log(Q_{m1}(T_{1x})) = T_1. \tag{27}$$



Using (26), two steps are required for the calculation of $p_m$: computing $T_{1x}$ from (27) and determining the statistics of $x = l_1/a_1$ in (26).

Unfortunately, (27) cannot be solved analytically for $T_{1x}$, so we find an approximate solution in the first step. Rearrange (27) to get

$$\left(1 + \frac{T_{1x} - 1}{L}\right)^L = T_{1x}\, e^{T_1}. \tag{28}$$

Expanding the left-hand side of (28) to second order, assuming $L$ is sufficiently large, and solving the resulting quadratic equation gives a first approximation for $T_{1x}$

$$T_{1x}^{(1)} = 1 + \sqrt{e^{2T_1} - 1}. \tag{29}$$

Now, since the function on the left-hand side of (27) is smooth, we can use a first-order Taylor series around the solution in (29) to get closer to the exact solution

$$T_{1x}^{(i+1)} = T_{1x}^{(i)} + \left[T_1 - T_1^{(i)}\right] \frac{T_{1x}^{(i)}\left(T_{1x}^{(i)} + L - 1\right)}{(L-1)\left(T_{1x}^{(i)} - 1\right)} \tag{30}$$

where $T_1^{(i)}$ depends on $T_{1x}^{(i)}$ through (27). Applying (30) a few times gives a very accurate solution. Note that $T_{1x}$ is computed once $n$ and $L$ are set, and does not depend on the SNR.
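The threshold computation described above can be sketched as a short Newton-type iteration on $\log Q_{m1}(x) = L\log(1 + (x-1)/L) - \log x$, whose derivative is $L/(L-1+x) - 1/x$. The function names `log_q` and `solve_threshold`, the iteration count, and the example values of $n$ and $L$ are assumptions made for this illustration; the starting point is the first approximation $1 + \sqrt{e^{2T_1} - 1}$.

```python
import math

def log_q(x, m):
    """log Q(x) = m * log(1 + (x - 1) / m) - log(x); m = L for one source."""
    return m * math.log1p((x - 1) / m) - math.log(x)

def solve_threshold(T, m, iters=20):
    """Solve log_q(x, m) = T for x > 1 by first-order (Newton) updates."""
    x = 1 + math.sqrt(math.exp(2 * T) - 1)   # closed-form starting point
    for _ in range(iters):
        slope = m / (m - 1 + x) - 1 / x      # derivative of log_q at x
        x += (T - log_q(x, m)) / slope       # first-order Taylor correction
    return x

n, L = 100, 10
T1 = (2 * L - 1) * math.log(n) / (2 * n)     # threshold on log Q
T1x = solve_threshold(T1, L)                 # equivalent threshold on x
residual = log_q(T1x, L) - T1
```

Because the threshold depends only on $n$ and $L$, it can be computed once and reused for every SNR point of a performance curve.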

The next step in calculating $p_{m1}$ is determining the statistics of $x$. From (10) and (11), we can see that $l_1$ is distributed as

$$l_1 \sim \mathcal{N}\left(\lambda_1 + \frac{(L-1)\lambda_1 \sigma^2}{n(\lambda_1 - \sigma^2)},\; \frac{\lambda_1^2}{n}\right). \tag{31}$$

In [3]-[5], [8], the bias term of $l_1$ is not considered; a numerical example clarifies why it matters. Assume that $n = 100$, $L = 10$, and $\sigma^2 = 1$. At the SNR at which $p_{m1}$ starts to become large, $\lambda_1 = 1.5$, $E(l_1) = 2.2$, and $\mathrm{Std}(l_1) = 0.15$. Therefore, overlooking the bias term (0.7) introduces a large error into the analysis. Since at the critical SNRs the signal eigenvalue gets closer to the noise eigenvalues, the denominator in (10) decreases and the bias term becomes large.

In the null case, $E(a_0) = \frac{1}{L} E(\mathrm{Tr}(\hat{R})) = \sigma^2 = 1$, which suggests that $E(a_1 \mid H_1) = \sigma^2$. However, a signal eigenvalue causes a negative bias on $a_1$, numerically about 2%. Thus, although we neglect the variance of $a_1$, which is very small compared to the variance of $l_1$, we should take the bias into account to achieve an accurate performance evaluation. In fact, the variance of an eigenvalue (whether a noise eigenvalue or a signal one) increases with its mean. This can be seen in the simulations, and can be justified for the noise eigenvalues by noting the decay of the Marcenko-Pastur distribution in Fig. 1, which results in an increasing variance of its order statistics. The variance of any order statistic of a distribution is inversely proportional to the squared value of the density in the vicinity of the mean value of that order statistic. A classical example of this fact is the variance of the median. For the signal eigenvalues, this is already shown in (11) and (13). This fact, along with the averaging in the calculation of $a_1$, shows that its variance is negligible in the analysis. To calculate the bias, note that

$$E(l_1) + (L-1) E(a_1) = E(\mathrm{Tr}(\hat{R})) = \mathrm{Tr}(R) = \lambda_1 + (L-1)\sigma^2.$$

This, together with (10), gives [16]:

$$H_1: \; a_1 = \sigma^2 - \frac{\sigma^2 \lambda_1}{n(\lambda_1 - \sigma^2)}. \tag{32}$$

Using (31) and (32), the distribution of $x$ is determined as a Gaussian random variable with known mean $\mu_x$ and variance $\sigma_x^2$. Then, $p_{m1}$ can be calculated as

$$p_{m1} = 1 - Q\left(\frac{T_{1x} - \mu_x}{\sigma_x}\right) \tag{33}$$

in which

$$Q(t) = \frac{1}{\sqrt{2\pi}} \int_t^{\infty} e^{-u^2/2}\, du. \tag{34}$$
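Putting the pieces together for a single source, the following sketch evaluates the missed-detection probability from $\lambda_1$, $\sigma^2$, $n$ and $L$. It assumes that $a_1$ is replaced by its biased mean $\sigma^2 - \sigma^2\lambda_1/(n(\lambda_1 - \sigma^2))$ and treated as a constant, so that $x = l_1/a_1$ is Gaussian with mean $E(l_1)/E(a_1)$ and standard deviation $\mathrm{Std}(l_1)/E(a_1)$; this ratio approximation and the names `log_q`, `solve_threshold`, and `p_miss_one_source` are assumptions made for this illustration.

```python
import math

def log_q(x, m):
    """log Q(x) = m * log(1 + (x - 1) / m) - log(x)."""
    return m * math.log1p((x - 1) / m) - math.log(x)

def solve_threshold(T, m, iters=30):
    """Solve log_q(x, m) = T for x > 1 with Newton updates."""
    x = 1 + math.sqrt(math.exp(2 * T) - 1)
    for _ in range(iters):
        x += (T - log_q(x, m)) / (m / (m - 1 + x) - 1 / x)
    return x

def p_miss_one_source(lam1, sigma2, n, L):
    """Approximate probability of missing a single source."""
    T1 = (2 * L - 1) * math.log(n) / (2 * n)
    T1x = solve_threshold(T1, L)
    # Biased mean and asymptotic std of the signal eigenvalue l_1.
    mean_l1 = lam1 + (L - 1) * lam1 * sigma2 / (n * (lam1 - sigma2))
    std_l1 = lam1 / math.sqrt(n)
    # Biased mean of the noise power estimate a_1 (variance neglected).
    mean_a1 = sigma2 - sigma2 * lam1 / (n * (lam1 - sigma2))
    mu_x = mean_l1 / mean_a1
    sigma_x = std_l1 / mean_a1
    z = (T1x - mu_x) / sigma_x
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # equals 1 - Q(z)

p_weak = p_miss_one_source(lam1=1.5, sigma2=1.0, n=100, L=10)
p_strong = p_miss_one_source(lam1=5.0, sigma2=1.0, n=100, L=10)
```

Sweeping `lam1` over a range of values traces the transition of the missed-detection probability from near one to near zero as the signal eigenvalue separates from the noise eigenvalues.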



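The same procedure can be used to calculate $p_{m2}$. The following approximation is widely used and justified in the literature [3, eq. (24)], [5, eq. (n.3a)]:

$$p_{m2} \approx P\left(\Lambda(1, L, n) < \Lambda(2, L, n) \mid H_2\right). \tag{35}$$

It basically states that the probability of missing one of the sources is much larger than that of missing both of them. We omit the details and give only the points that are important in the calculation of $p_{m2}$:

$$p_{m2} = P\left(\log Q_{m2}\left(\frac{l_2}{a_2}\right) < T_2\right) \tag{36}$$

in which the threshold $T_2$ and the function $Q_{m2}$ are defined as

$$T_2 = \frac{1}{2n}(2L-3)\log(n) \tag{37}$$

$$Q_{m2}(x) = \frac{1}{x}\left(1 + \frac{x-1}{L-1}\right)^{L-1} \tag{38}$$

$$x = \frac{l_2}{a_2}. \tag{39}$$

The recursive equation to estimate the threshold $T_{2x}$ will be

$$T_{2x}^{(i+1)} = T_{2x}^{(i)} + \left[T_2 - T_2^{(i)}\right] \frac{T_{2x}^{(i)}\left(T_{2x}^{(i)} + L - 2\right)}{(L-2)\left(T_{2x}^{(i)} - 1\right)}. \tag{40}$$

The distribution of $l_2$ will be

$$l_2 \sim \mathcal{N}\left(\lambda_2 + \frac{(L-2)\lambda_2 \sigma^2}{n(\lambda_2 - \sigma^2)} - \frac{\lambda_1 \lambda_2}{n(\lambda_1 - \lambda_2)},\; \frac{\lambda_2^2}{n}\right). \tag{41}$$

$a_2$ will have a negligible variance and can be estimated by its mean value:

$$H_2: \; E(a_2) = \sigma^2 - \frac{\sigma^2 \lambda_1}{n(\lambda_1 - \sigma^2)} - \frac{\sigma^2 \lambda_2}{n(\lambda_2 - \sigma^2)}. \tag{42}$$

Now, using (41) and (42), the distribution of $x$ in (39) can be found, and $p_{m2}$ is obtained as in (33). The same procedure can be used for determining $p_m$ for any number of sources.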

B. Deterministic Signal Model 

Although the first- and second-order statistical properties of the signal subspace eigenvalues are different under the stochastic and deterministic signal models, the performance of the MDL is the same under the two models. As explained in Section III-A, $p_m$ depends on the statistics of the weakest signal eigenvalue $l_d$. We show that these statistics become similar under the two models when $l_d$ approaches the noise eigenvalues. Note that, for a fair comparison of the two signal models, the second-order characteristics of the signal should be the same (see e.g. [9, sec. V]). Therefore, we have $\lim_{n \to \infty} S_{det} S_{det}^H / n = E(s_{sto} s_{sto}^H)$, which results in $R_{det} = R_{sto}$ and hence $\lambda_{i,det} = \lambda_{i,sto}$, $i = 1, \ldots, L$.
In the situations where $p_m$ starts to grow large, $l_d$ is barely larger than the noise eigenvalues, $\lambda_d \approx \sigma^2$; then from (12) we have

$$E(l_{d,det}) = \lambda_d + \sum_{i \neq d} \frac{\sigma^2 \lambda_i}{n(\sigma^2 - \lambda_i)} \tag{43}$$

which is the same as (10) in the stochastic signal model. For the variances, we assume that $\lambda_d$ has approached the upper limit of the noise eigenvalues

$$\lambda_d \approx \sigma^2 \left(1 + \sqrt{\frac{L}{n}}\right)^2 \tag{44}$$

which is the upper limit of the Marcenko-Pastur distribution in (14). Note that, as the signal power decreases, its eigenvalue approaches the noise eigenvalues, which lie roughly around $\sigma^2$. However, $\lambda_d$ cannot be smaller than the largest noise eigenvalue, due to the sorting of the eigenvalues. Thus, as the SNR decreases, $\lambda_d$ approaches the upper limit of the noise eigenvalues given by (44). In fact, we are using a better approximation for $\lambda_d$ in calculating the variance with (44) than in calculating the expectation in (43). Assuming $L \ll n$, a first-order expansion of (44) can be used in (11) to give

$$\mathrm{Var}_{sto}(l_d) = \frac{\lambda_d^2}{n} \approx \frac{\sigma^4}{n}\left(1 + 4\sqrt{\frac{L}{n}}\right) \tag{45}$$

and in (13) to give

$$\mathrm{Var}_{det}(l_d) = \frac{1}{n}\left(2\lambda_d \sigma^2 - \sigma^4\right) \approx \frac{1}{n}\sigma^4\left[2\left(1 + \sqrt{\frac{L}{n}}\right)^2 - 1\right] \tag{46}$$

which reduces to the result in (45), and we can conclude that the variance of $l_d$ is the same under the two models at low SNRs. Hence, $p_m$ is approximately the same under the two signal models. This is in harmony with the corresponding result for the DOA estimation problem, where the performance of the estimators is the same under the two signal models [9].
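The closeness of the two variances at low SNR can be checked numerically. The sketch below places the weakest signal eigenvalue at the upper edge of the noise bulk, $\lambda_d = \sigma^2(1 + \sqrt{L/n})^2$, and compares the stochastic-model variance $\lambda_d^2/n$, the deterministic-model variance $(2\lambda_d\sigma^2 - \sigma^4)/n$, and their common first-order value $\sigma^4(1 + 4\sqrt{L/n})/n$. The function name `low_snr_variances` and the example values of $n$ and $L$ are assumptions made for this illustration.

```python
import math

def low_snr_variances(n, L, sigma2=1.0):
    """Variances of the weakest signal eigenvalue at the noise-bulk edge."""
    eps = math.sqrt(L / n)
    lam_d = sigma2 * (1 + eps) ** 2                     # upper edge of the bulk
    var_sto = lam_d ** 2 / n                            # stochastic model
    var_det = (2 * lam_d * sigma2 - sigma2 ** 2) / n    # deterministic model
    var_first_order = sigma2 ** 2 * (1 + 4 * eps) / n   # shared first-order form
    return var_sto, var_det, var_first_order

var_sto_900, var_det_900, var_fo_900 = low_snr_variances(n=900, L=10)
var_sto_100, var_det_100, var_fo_100 = low_snr_variances(n=100, L=10)
ratio_900 = var_det_900 / var_sto_900
ratio_100 = var_det_100 / var_sto_100
```

The ratio of the two variances moves toward one as $L/n$ shrinks, in line with the observation that the agreement between the two signal models improves as the number of observations grows.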

IV. Simulation Results 

In this section, simulation results are presented to support the theoretical derivations. We consider $p_m$ for different numbers of snapshots $n$ and numbers of sensors $L$ in a uniform linear array with half-wavelength inter-element spacing. Our estimate is compared with [3] and [8]. Results are presented for two closely spaced sources in $p_{m2}$, and for one source in $p_{m1}$. When the sources get closer to each other, the weaker signal eigenvalue approaches the noise eigenvalues and a miss may occur. Therefore, for a fixed angular separation of the sources, a minimum SNR is required for the array to be able to detect both sources.

Two equally powered uncorrelated signal sources at ±2° are assumed. The SNR is defined as the ratio of each signal variance to the noise variance (i.e., the sensor SNR). Figs. 2, 3, and 4 show the corresponding results for $p_{m2}$ in different situations in terms of $n$ and $L$. Fig. 5 presents the results for $p_{m1}$ in the worst case of parameters. The superiority of our method in predicting the simulation results is evident. In Fig. 2, simulation results are presented for both deterministic and stochastic signals, which confirms the approximate equality of $p_m$ under the two models. This equality improves as the number of observations $n$ increases. Note that our method is used to estimate $p_m$ under the stochastic signal model in Fig. 2. The analysis in [3] under-estimates $p_m$, with a horizontal distance of about 0.5-2 dB. This method improves as $n$ gets larger, since the neglected biases then decrease. The estimate of [8] is better than that of [3], with an over-estimation of $p_m$ equivalent to a horizontal distance of about 0.5-1 dB. Note that in the extreme case of $L = 32$ and $n = 64$ in Fig. 4, our analysis starts to degrade since the asymptotic assumption is no longer valid. In most cases, though, our estimate exhibits a horizontal distance of about 0.03 dB.

Fig. 2. $p_{m2}$ of the MDL method versus SNR (dB) when the number of sensors is $L = 10$ and the number of snapshots is $n = 100$.

Fig. 3. $p_{m2}$ of the MDL method versus SNR (dB) when the number of sensors is $L = 10$ and the number of snapshots is $n = 900$.

Fig. 4. $p_{m2}$ of the MDL method versus SNR (dB) when the number of sensors is $L = 32$ and the number of snapshots is $n = 64$.

Fig. 5. $p_{m1}$ of the MDL method versus SNR (dB) when the number of sensors is $L = 32$ and the number of snapshots is $n = 64$.
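An empirical missed-detection curve of the kind discussed above can be approximated with a small Monte Carlo experiment. The sketch below assumes a half-wavelength uniform linear array with steering phases $e^{j\pi k \sin\theta}$, unit-variance circular Gaussian noise, and stochastic Gaussian sources; a miss is counted whenever the MDL estimate is smaller than the true number of sources. The function names, the trial count, and the random seed are assumptions made for this illustration and do not reproduce the exact simulation setup of the figures.

```python
import numpy as np

def steering_matrix(L, angles_deg):
    """Half-wavelength ULA steering matrix, one column per source angle."""
    k = np.arange(L)[:, None]
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))[None, :]
    return np.exp(1j * np.pi * k * np.sin(theta))

def mdl_estimate(eigvals, n):
    """Minimize the MDL cost over d = 0, ..., L-1."""
    l = np.sort(eigvals)[::-1]
    L = l.size
    cost = []
    for d in range(L):
        tail = l[d:]
        log_ratio = np.log(tail.mean()) - np.log(tail).mean()  # log(a_d / g_d)
        cost.append(n * (L - d) * log_ratio
                    + 0.5 * d * (2 * L - d) * np.log(n))
    return int(np.argmin(cost))

def empirical_p_miss(L, n, angles_deg, snr_db, trials=200, seed=0):
    """Fraction of trials in which MDL under-estimates the number of sources."""
    rng = np.random.default_rng(seed)
    A = steering_matrix(L, angles_deg)
    d = A.shape[1]
    amp = 10 ** (snr_db / 20)          # per-source amplitude for unit noise power
    misses = 0
    for _ in range(trials):
        S = amp * (rng.standard_normal((d, n))
                   + 1j * rng.standard_normal((d, n))) / np.sqrt(2)
        V = (rng.standard_normal((L, n))
             + 1j * rng.standard_normal((L, n))) / np.sqrt(2)
        X = A @ S + V
        R_hat = X @ X.conj().T / n     # sample correlation matrix
        misses += mdl_estimate(np.linalg.eigvalsh(R_hat), n) < d
    return misses / trials

p_high_snr = empirical_p_miss(L=10, n=100, angles_deg=[-2, 2], snr_db=20, trials=100)
p_low_snr = empirical_p_miss(L=10, n=100, angles_deg=[-2, 2], snr_db=-20, trials=100)
```

Evaluating `empirical_p_miss` over a grid of SNR values gives an empirical curve against which an analytical prediction can be compared.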

We have seen that the analysis in [3]-[5] does not include the biases of the eigenvalues and also suffers from some inaccurate approximations. The analysis in [8] requires more scrutiny since, as we have seen in the simulation results, it gives completely different results from [3]. The authors of [8] use asymptotic conditions to show that $\Lambda(d-1) - \Lambda(d)$ converges in distribution to a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. Simulations show that although the formula derived for $\sigma^2$ in [8] is a very good estimate of the empirical value, the same is not true for the mean $\mu$, which in fact shows considerable deviation. This disagreement is present for small $n$ as well as large $n$. The derived result for the mean of the Gaussian distribution in [8, eq. (19)] is

$$\mu = n \log\left[\frac{\sigma^2}{\lambda_d}\left(1 + \frac{\lambda_d - \sigma^2}{(L-d+1)\,\sigma^2}\right)^{L-d+1}\right] + 0.5\,(2d - 2L - 1)\log(n) \tag{47}$$

which, in the notation of our analysis, is $n \log Q_{md}(x)$ plus a nonrandom term. It is evident that (47) is derived assuming $E(l_i) = \lambda_i$ for the signal subspace and $E(a_d) = \sigma^2$, so all biases in the distributions of $l_i$ and $a_d$ are ignored. Additionally, although we can assume the distribution of $x$ to be Gaussian, it is not easy to assume normality for $\Lambda(d-1) - \Lambda(d)$, since it is a highly nonlinear function of $x$. Simulations show that the normality assumption is approximately valid only for large values of $n$, say $n \approx 1000$. Another issue is that the nonlinearity of the function $\log(Q_{md}(x))$ shifts the mean of the distribution, which is not taken into account.

Here, we give further simulation results that compare our analysis with the one presented in [8]. We assume the same conditions as in [8, Fig. 1], namely $n = 900$, $L = 7$, and two Gaussian sources at $\theta = [-5°, +10°]$. The results are shown in Fig. 6, where the experimental performance of the MDL method is accurately predicted by both our method and the method presented in [8]. Although, from a theoretical point of view, the method of [8] is not comprehensive enough, it works well for this particular set of parameters. If we change the source DOAs and keep all other parameters unchanged, the predictions of [8] degrade. Fig. 7 shows the experimental results and theoretical predictions when the sources are at $\theta = [-5°, 20°]$. It is evident that the method of [8] no longer works well, while our method is still accurate. Note that we have already investigated its performance for very closely spaced sources in our previous simulation results, where the method in [8] failed to predict the performance accurately. Therefore, the method in [8] cannot be considered a reliable method of analytical performance calculation.

Fig. 6. $p_{m2}$ of the MDL method versus SNR (dB) when the number of sensors is $L = 7$ and the number of snapshots is $n = 900$. The performance prediction method in [8] works well for this set of parameters.

Fig. 7. $p_{m2}$ of the MDL method versus SNR (dB) when the number of sensors is $L = 7$ and the number of snapshots is $n = 900$. The performance prediction method in [8] does not work well for this set of parameters.

V. Conclusion 

An accurate performance analysis for the probability of missed detection of the MDL source enumeration method was presented. A statistical characterization of the principal components of the covariance matrix helped in choosing good assumptions and approximations, which resulted in improved estimates of $p_m$. Using a perturbation analysis that gives the statistical properties of the eigenvalues in the deterministic signal model, it was shown that the performance is approximately identical under the stochastic and deterministic signal models. Simulation results show the superiority of the proposed analysis over the previous results.



Appendix I
Proof of Lemma 1

Let $X = Y - \mu$ and rearrange the covariance in (9) as

$$n^2 c = \mathrm{Cov}\left(\alpha^H X X^H \beta + \alpha^H \mu X^H \beta + \alpha^H X \mu^H \beta,\; \gamma^H X X^H \zeta + \gamma^H \mu X^H \zeta + \gamma^H X \mu^H \zeta\right). \tag{48}$$

Circularity of the distribution and the vanishing odd moments of the zero-mean Gaussian distribution reduce (48) to

$$n^2 c = \mathrm{Cov}(\alpha^H X X^H \beta,\; \gamma^H X X^H \zeta) + \mathrm{Cov}(\alpha^H \mu X^H \beta,\; \gamma^H \mu X^H \zeta) + \mathrm{Cov}(\alpha^H X \mu^H \beta,\; \gamma^H X \mu^H \zeta). \tag{49}$$

The first term in (49) is given by (8). The fact that $x_i \perp x_j$ for $i \neq j$ reduces the second term to

$$\alpha^H \mu\, E(X^H \beta\, \zeta^H X)\, \mu^H \gamma = \alpha^H \mu\, \mathrm{diag}\left(E(x_i^H \beta\, \zeta^H x_i)\right) \mu^H \gamma = (\alpha^H \mu \mu^H \gamma)(\zeta^H \Sigma \beta). \tag{50}$$

The third term in (49) can be derived in the same way. Note that all three terms on the right-hand side of (9) are $O(n^{-1})$, since $\mu$ is of dimension $L \times n$ and hence $\mu\mu^H$ is $O(n)$.



Appendix II
Proof of Theorem 2

In the asymptotic region $n \gg 1$, $\hat{R}$ is a slightly perturbed version of $R$, described as

$$\hat{R} = R + pA \tag{51}$$

where $p \ll 1$ is the perturbation factor. Small perturbations in $R$ result in small changes in its eigenvectors if the associated eigenvalues are sufficiently separated [12]. This means that the following results hold for the signal eigenvalues. Recall the definition of the eigendecompositions as $R v_i = \lambda_i v_i$ and $\hat{R} w_i = l_i w_i$. The first-order perturbation in the eigenvectors is

$$w_i = v_i + p \sum_{k \neq i} t_{ik} v_k \tag{52}$$

where the $t_{ik}$ are the perturbation coefficients. Straightforward calculations give [13, eq. (A.9)], [14]:

$$l_i = \lambda_i + p\, v_i^H A v_i + \sum_{k \neq i} t_{ik}\, p^2\, v_i^H A v_k \tag{53}$$

$$t_{ik} = \frac{v_k^H A v_i}{\lambda_i - \lambda_k}. \tag{54}$$

Under the conditions of Theorem 2 we have

$$\mathrm{Cov}(t_{ik}, t_{jr}) = \delta_{ij}\, \delta_{kr}\, \frac{(\lambda_i + \lambda_k)\sigma^2 - \sigma^4}{n p^2 (\lambda_i - \lambda_k)^2} \tag{55}$$

which is shown using (54) and replacing $\mu\mu^H = n(R - \sigma^2 I)$ in (9). Now, (12) is proved using (53) and (9). (13) can be shown using (53) to first order and (9). Note that the limiting distribution of the eigenvalues is Gaussian [9].

References

[1] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, pp. 387-392, Apr. 1985.

[2] A. P. Liavas, P. A. Regalia, and J. P. Delmas, "Blind channel approximation: Effective channel order determination," IEEE Trans. Signal Process., vol. 47, pp. 3336-3344, Dec. 1999.

[3] H. Wang and M. Kaveh, "On the performance of signal-subspace processing - part I: narrow-band systems," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 1201-1209, Oct. 1986.

[4] M. Kaveh, H. Wang, and H. Hung, "On the theoretical performance of a class of estimators of the number of narrow-band sources," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-35, pp. 1350-1352, Sep. 1987.

[5] Q. Zhang, K. M. Wong, P. C. Yip, and J. P. Reilly, "Statistical analysis of the performance of information theoretic criteria in the detection of the number of signals in array processing," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, pp. 1557-1567, Oct. 1989.

[6] W. Xu and M. Kaveh, "Analysis of the performance and sensitivity of eigendecomposition-based detectors," IEEE Trans. Signal Process., vol. 43, pp. 1413-1426, June 1995.

[7] A. P. Liavas and P. A. Regalia, "On the behavior of information theoretic criteria for model order selection," IEEE Trans. Signal Process., vol. 49, pp. 1689-1695, Aug. 2001.

[8] E. Fishler, M. Grossmann, and H. Messer, "Detection of signals by information theoretic criteria: general asymptotic performance analysis," IEEE Trans. Signal Process., vol. 50, pp. 1027-1036, May 2002.

[9] B. Ottersten, M. Viberg, and T. Kailath, "Analysis of subspace fitting and ML techniques for parameter estimation from sensor array data," IEEE Trans. Signal Process., vol. 40, pp. 590-599, Mar. 1992.

[10] D. R. Brillinger, Time Series: Data Analysis and Theory. New York: Holt, Rinehart, and Winston, 1975.

[11] I. M. Johnstone, "On the distribution of the largest eigenvalue in principal component analysis," Annals of Statistics, vol. 29, no. 2, pp. 295-327, 2001.

[12] G. H. Golub and C. F. Van Loan, Matrix Computations. The Johns Hopkins University Press, 1989.

[13] M. Kaveh and A. J. Barabell, "The statistical performance of the MUSIC and the minimum-norm algorithms in resolving plane waves in noise," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 331-341, Apr. 1986.

[14] J. H. Wilkinson, The Algebraic Eigenvalue Problem. New York: Oxford University Press, 1965.

[15] D. Lawley, "Tests of significance for the latent roots of covariance and correlation matrices," Biometrika, vol. 43, pp. 128-136, 1956.

[16] K. M. Wong, Q. Zhang, J. P. Reilly, and P. C. Yip, "On information theoretic criteria for determining the number of signals in high resolution array processing," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 1959-1971, Nov. 1990.



